The human immunodeficiency virus type 1 (HIV-1) genome contains multiple, highly conserved structural
RNA domains that play key roles in essential viral processes. Interference with the function of these RNA
domains, either by disrupting their structures or by blocking their interaction with viral or cellular factors,
may seriously compromise HIV-1 viability. RNA aptamers are amongst the most promising synthetic
molecules able to interact with structural domains of viral genomes. However, shortening aptamers down to
their minimal active domain is usually necessary for scaling up production, which requires very
time-consuming, trial-and-error approaches. Here we report on the in vitro selection of 64 nt-long specific
aptamers against the complete 5'-untranslated region of the HIV-1 genome, which inhibit more than 75% of
HIV-1 production in a human cell line. The analysis of the selected sequences and structures allowed for the
identification of a highly conserved 16 nt-long stem-loop motif containing a common 8 nt-long apical loop.
Based on this result, an in silico designed 16 nt-long RNA aptamer, termed RNApt16, was synthesized, with
sequence 5'-CCCCGGCAAGGAGGGG-3'. The HIV-1 inhibition efficiency of this aptamer was close to
85%, making it the shortest RNA molecule described so far that efficiently interferes with HIV-1
replication.

The human immunodeficiency virus type 1 (HIV-1) genome is a 9.2 kb long, single-stranded RNA (ssRNA)
molecule of positive polarity with multiple open reading frames (ORFs), including those for structural and
functional proteins gag, pol and env. The coding region is flanked at both ends by untranslated regions
(UTRs). HIV-1 infective particles contain two identical genomic RNA molecules. The genome is reverse
transcribed into a cDNA copy that is integrated into the host genome. The transcription of the proviral cDNA
generates capped, full-length genomic RNAs that must undergo alternative splicing to accommodate the different
ORFs immediately downstream of the capped 5'-end. All the HIV-1 genomic and subgenomic RNAs share
approximately the first 300 nt of their 5'-UTR. This sequence contains several well-characterized and conserved
functional RNA domains, including the trans-activator response (TAR) element, the polyadenylation [poly(A)]
region, the primer-binding site (PBS), the dimerisation initiation site (DIS), the major splicing donor (SD) and
part of the packaging signal (psi) (Fig. 1). These RNA domains are involved in alternative, functionally relevant
RNA-RNA interactions that determine two mutually exclusive conformations of the overall 5'-UTR region,
termed Branched Multiple Hairpins (BMH) and Long Distance Interaction (LDI).

The identification of highly conserved structural-functional elements within viral RNA genomes has attracted
much attention due to their potential use as targets for novel antiviral drugs. Indeed, small RNA molecules have
proved to be efficient inhibitors targeting functional RNA genomic domains (including those at the 5'-UTR)
as well as gene products of HIV-1. Some of the inhibitory RNAs have been included in clinical trials for
HIV-1 treatment. This strategy has also been applied to other clinically relevant RNA viruses.

Aptamers are short RNA or DNA oligonucleotides that, due to their three-dimensional structure, can efficiently
and specifically bind a molecular target. Most of the developed aptamers have been artificially obtained using
Figure 1 | Schematic representation of the BMH conformation of the HIV-1 UTR308. Essential structural domains are indicated. Putative binding sites
for RNApt16 (boxed) and the in vitro selected aptamers are shown in boldface, and labelled from (a) to (f). The alternative, partially overlapping binding
region (a') is shown.



SELEX (Systematic Evolution of Ligands by Exponential enrichment)
techniques, and their potential as therapeutic agents has
been widely established. A SELEX procedure consists of
exposing a random population of nucleic acid molecules to a
desired target, under experimental conditions that allow the
molecules bound to the target to be separated for a subsequent
amplification round. The successive selection-amplification cycles
progressively enrich the population in molecules able to bind the
target. Usually, the starting population for SELEX consists of
molecules with a random sequence flanked by constant regions needed
for primer binding. This imposes a minimal aptamer length, which in
turn limits its usefulness as a therapeutic agent. Several deletion
methods have been developed to experimentally obtain the minimal
domain that allows efficient aptamer binding. Bioinformatic
strategies constitute an alternative, though less explored, approach
for this purpose. An example of such a 'rational truncation approach'
has been successfully applied to the production of 40-50 nt long
functional RNA aptamers. We report here the in vitro selection of
efficient anti-HIV-1 RNA aptamers targeting the viral 5'-UTR, and
their shortening to a 16 nt-long minimal aptamer by the application
of computational strategies based on sequence and structure analysis
to identify and improve core aptamer domains. The combination of
an in vitro selection with the in silico approach has allowed us to
design the shortest reported RNA molecule that efficiently inhibits
HIV-1 production in a human cell line.

Results 

In vitro selection of RNA aptamers against the 5'-UTR of HIV-1. 

An in vitro selection strategy was used to isolate anti-HIV-1 RNA
aptamers targeting the first 308 nt of the 5'-UTR of the HIV-1
subtype B NL4.3 genome. This target corresponds to the sequence
fragment of the 5'-UTR that is common to all genomic and
subgenomic HIV-1 RNAs, including 8 nt from the unspliced SD
domain that were retained in the molecule to prevent
alternative foldings of the entire molecule due to partial SD
deletion after splicing. The starting RNA population consisted of
roughly 7 X 10'^^ variants of 64 nt long molecules. RNA molecules 
were challenged for binding against the target molecule fixed to a
sepharose-streptavidin column. Fourteen cycles of
selection-amplification were performed, and the selective pressure was
modified stepwise by increasing the binding temperature from
round IV on, and by reducing the target:aptamer ratio from round
XI on. The yield of complex formation increased along the selection
process, as shown by the evolution of Bmax (Fig. 2). A total of 299
individual clonal sequences derived from the populations selected at
rounds 0 (initial random population, 30 seqs), I (30), III (31), V (24),
VIII (28), IX (32), X (35), XI (52) and XIV (37) were analyzed (Fig. 3
and data not shown). The presence of intra- and/or inter-round
repeated sequences reduced the diversity of the
analyzed collection to 216 different molecules (Table S1). The
repeated sequences gained representation from round IX on, thus
showing that the selection process was effective. Finally, 188 out of
the 216 different sequences showed the expected length of 64 nt,
while the others contained deletions of one or more nucleotides in
their variable region. The clustering of those 188 different, 64 nt-long
sequences is shown in Figure S1.

Sequence analysis showed that three clonal sequences from round
IX (9.4% of the population), 15 from round X (42.9%), 35 from round
XI (67.3%), and 33 from XIV (89.2%) shared the consensus octamer
5'-GGCAAGGA-3' (with a point mutation in sequence XIV32:
5'-GGCAGGGA-3'). Interestingly, all main groups of repeated
sequences in rounds XI and XIV contained the consensus octamer.
Two groups were particularly relevant at rounds XI and XIV: i)
Group 1, formed by clone 21 and 6 other repeated sequences of
round XI (termed XI21-7), identical to 6 repeated sequences of round
XIV (termed XIV26-6); and ii) Group 2, constituted by 17 sequences
of round XI (termed XI1-17), identical to 23 repeated sequences of
round XIV (termed XIV22-23). The latter group was first observed in
round IX (see Table S1). Thus, our results indicate that the selection
procedure has been successful in reducing the initial variability to
two major sequences that together represent 51% and 80% of the
total population in rounds XI and XIV, respectively. Sequence
comparison showed that the hexanucleotide 5'-GGCAAG-3' within the
consensus octamer is complementary to the apical loop of the
poly(A) domain within the repeated region (R) located at both the 5'-
and 3'-UTRs of HIV-1 (Figs. 1 and 3), thus suggesting the putative
relevance of this recognition site for the selected aptamers.
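
The complementarity noted above can be checked with a few lines of code. The following is a minimal sketch, not part of the original analysis: it takes the hexanucleotide 5'-GGCAAG-3' and computes its antiparallel Watson-Crick partner, which should coincide with the poly(A) apical loop sequence 5'-CUUGCC-3' quoted in the Discussion. The function and variable names are illustrative.

```python
# Watson-Crick complement for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the antiparallel Watson-Crick partner, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


# Hexanucleotide within the consensus octamer 5'-GGCAAGGA-3'.
aptamer_hexamer = "GGCAAG"
# Poly(A) apical loop sequence given in the Discussion (nts 81-86).
poly_a_loop = "CUUGCC"

partner = reverse_complement(aptamer_hexamer)
print(partner, partner == poly_a_loop)
```

The check only tests sequence complementarity; whether the two regions are accessible for pairing depends on the folded structures, which is what the structural analysis in the following sections addresses.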

The remaining sequences in rounds XI and XIV can be grouped 
into two additional groups of molecules, defined by common 
sequence motifs complementary to the TAR apical loop and the 
Figure 2 | Changes in relevant variables along the selection process.
Average distances were computed over all possible sequence/structure
pairs at each round, and error bars show the mean square deviation of each
sample. In Bmax, error bars represent the experimental error.



SD apical loop of the HIV-1 5'-UTR, respectively (Fig. 3). Additionally,
certain unique sequences were identified, which did not show a clear 
sequence complementarity to any domain within the target sequence 
(Fig. 3). 

In silico sequence and structure analysis. To study the effect of
selection along the process, we measured the similarity of the
clonal sequences within each round. The distributions of the
Hamming distance between pairs of sequences at rounds 0, I to IX,
X to XIV, and I to XIV are shown in Figure S2. Since a different
number of sequences was available for each round, the distributions
are rescaled by the factor M(M-1)/2, the number of sequence pairs,
with M being the number of sequences per round. For rounds 0-IX,
the distributions are relatively symmetric and peak between 17 and 19.
This value is close to the average Hamming distance between random
sequences: 25 × 3/4 = 18.75, where 25 is the length of the variable
region and 3/4 is the probability that a nucleotide differs from another
randomly chosen nucleotide. In turn, for rounds XI-XIV the distribution
showed several peaks at varying distances, and groups of equal
(zero Hamming distance) and very similar (distances 1 and 2)
sequences were present in the pool. Note that the height of the
peak is proportional to the absolute number of repeated sequences
within a group. The comparison of the similarities among secondary
structures along the process yields qualitatively analogous results
(Figure S3). A simultaneous comparison of the similarity between
sequences and the corresponding structures is shown in Figure S4.
That representation highlights the known fact that similar sequences
can fold into significantly different minimum free energy (MFE)
structures and, vice versa, identical structures may arise from
significantly different sequences.
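
The pairwise Hamming analysis described above can be sketched as follows. This is an illustrative example only: the pool of sequences is generated at random (it is not the experimental data), and the helper names are assumptions. It shows the number of pairs M(M-1)/2 used for rescaling and the expected distance between random sequences of the variable-region length.

```python
import random
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(seqs):
    """Hamming distance for every unordered pair: M(M-1)/2 values."""
    return [hamming(a, b) for a, b in combinations(seqs, 2)]


def expected_random_distance(length: int, alphabet_size: int = 4) -> float:
    """Mean distance between two random sequences: length * (k-1)/k."""
    return length * (alphabet_size - 1) / alphabet_size


VARIABLE_LENGTH = 25  # length of the variable region, as stated in the text

# Hypothetical random pool of M = 30 sequences, standing in for round 0.
rng = random.Random(0)
pool = ["".join(rng.choice("ACGU") for _ in range(VARIABLE_LENGTH))
        for _ in range(30)]

distances = pairwise_hamming(pool)
print(len(distances))                             # 435 pairs for M = 30
print(expected_random_distance(VARIABLE_LENGTH))  # 18.75
print(sum(distances) / len(distances))            # sample mean for this pool
```

For a selected pool, repeated clones would show up as an excess of pairs at distance 0, 1 or 2, which is the signature described for rounds XI-XIV.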

Regarding the secondary structures of the folded sequences, the
evolution of two thermodynamic parameters along the in vitro
selection process, the ensemble diversity (ED) and the frequency of
the MFE structure (FME), is graphically depicted in Figure S5 for
the 188 different, 64 nt-long sequences studied. Groups of sequences
were first detected at round VIII, and the process subsequently led to
the appearance of two major groups: i) Group 1, represented by the
folded sequence XIV26-6 and characterized by a low ED and high
FME; and ii) Group 2, represented by XIV22-23, showing a high ED
and low FME.

Average values of the Hamming distance between sequences, as
well as of the base-pair distance between structures along the process,
are depicted in Figure 2, in parallel with the evolution of the extent
of the binding reaction, Bmax. A trend towards lower average distance
between all pairs of sequences and structures in the population
becomes apparent from round IX to XIV. Despite the large dispersion
of these measures, the increase in similarity among groups of
molecules is genuine, as the full distributions of pair
distances (Figs. S2, S3 and S4) show. Based on the base-pair distance
values, a clustering of the secondary structures corresponding to the
188 sequences is shown in Figure S6.
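
For readers unfamiliar with the base-pair distance, the sketch below shows one common definition: the number of base pairs present in one structure but not in the other. It is written in plain Python on dot-bracket strings and is an assumption about the metric, since the text does not spell out the exact definition used; the toy structures are hypothetical.

```python
def base_pairs(dot_bracket: str) -> set:
    """Parse a dot-bracket string into a set of (i, j) index pairs."""
    stack, pairs = [], set()
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.add((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Count base pairs found in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return len(base_pairs(s1) ^ base_pairs(s2))


# Hypothetical 16 nt structures: a 4 bp stem, a 3 bp stem, and no pairs.
full_stem = "((((........))))"
short_stem = "(((..........)))"
unfolded = "................"

print(bp_distance(full_stem, short_stem))  # 1
print(bp_distance(full_stem, unfolded))    # 4
```

Averaging this distance over all pairs of predicted structures in a round gives a structural counterpart to the average Hamming distance between sequences.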

Design of RNApt16 and in silico analysis of the aptamer-target
interaction. The in silico predicted MFE secondary structure model
of the aptamers selected in round XIV is depicted in Figure 4A-C.
Folding energies corresponding to each structure (in kcal/mol) are
ΔG[XIV22] = -10.80; ΔG[XIV1] = -11.80; ΔG[XIV32] =
-10.80; ΔG[XIV12] = -10.90; ΔG[XIV26] = -15.70; ΔG[XIV5]
= -11.10; ΔG[XIV48] = -9.80; ΔG[XIV37] = -7.32; ΔG[XIV25]
= -11.50. The most abundant structures (belonging to Groups 1
and 2 defined above) indicate that the consensus octamer
5'-GGCAAGGA-3' is always placed in an apical loop flanked by
complementary sequences that form a double-stranded region of
at least 4 bp in length. The RNAfold algorithm calculates the
likelihood that any nucleotide in a sequence actually occupies the
predicted structural position by analysing how often it appears in
the structural ensemble of the analysed sequence. While some of the
[Figure 3 image: aligned aptamer sequences in three groups, labelled by the 5'-UTR loop each group is complementary to: poly(A) apical loop, TAR apical loop and SD apical loop.]

Figure 3 | Aptamer sequences of rounds XI and XIV, grouped according to putative common targets. The variable region is shown in boldface. The
single-stranded sequences predicted by RNAfold software are boxed. White text represents the sequences complementary to 5'-UTR loop regions
(shown below each group). ×N: sequence multiplicity in each round. Among these putative interaction sites, once the energy of the folded configuration is
considered (Table S2), the only robust and energetically favoured interaction is that with the poly(A) apical loop.



motifs present in the folded sequences have low reliability, we
observed that the consensus octamer forming a loop plus its 4 nt
flanking regions consistently shows a stem-loop configuration. Such
an RNA motif is systematically found in most structures of the
thermodynamic ensemble of sequences in round XIV (Figure 4A-B),
and does not contribute significantly to their observed ED. The
consensus sequence of that conserved motif, with structure
5'-((((........))))-3', is 5'-NNDYGGCARGGARNNN-3' (sequence
alignment not shown). Based on this fact, a 16 nt-long stem-loop
RNA molecule was designed in silico as a minimal aptamer
potentially able to interact with the 5'-UTR of HIV-1. This aptamer
was termed RNApt16, and included the consensus octamer in a
loop closed by the 4 bp-long stem allowing the highest possible
thermodynamic stability of the folded molecule, thus formed by
four consecutive C-G base pairs: 5'-CCCCGGCAAGGAGGGG-3'.

The secondary structure of RNApt16 is shown in Figure 4D, the
folding energy associated with its MFE is -6.50 kcal/mol, and the
frequency of this MFE within the thermodynamic ensemble is
91.82%.
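
The construction of the minimal aptamer can be illustrated with a short sketch: an 8 nt loop is wrapped in a perfectly paired 4 bp C-G stem, giving the intended dot-bracket structure. This is only a string-level illustration under the assumption that the stem pairs as designed; it does not fold the molecule or compute any energy (that requires a folding program such as RNAfold). The function name `build_stem_loop` is hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def build_stem_loop(loop: str, stem_5p: str = "CCCC") -> tuple:
    """Wrap `loop` in a fully paired stem.

    Returns (sequence, intended dot-bracket structure). The 3' stem arm
    is the antiparallel Watson-Crick partner of the 5' arm.
    """
    stem_3p = "".join(COMPLEMENT[base] for base in reversed(stem_5p))
    sequence = stem_5p + loop + stem_3p
    structure = "(" * len(stem_5p) + "." * len(loop) + ")" * len(stem_3p)
    return sequence, structure


# Consensus octamer placed in the loop, as described in the text.
sequence, structure = build_stem_loop("GGCAAGGA")
print(sequence)   # CCCCGGCAAGGAGGGG
print(structure)  # ((((........))))
```

The same helper with a different loop reproduces the layout 5'-CCCCNNNNNNNNGGGG-3' used later when searching for a negative control.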

We used RNAup to predict the preferred sites of interaction
between the target RNA and either three selected aptamers (from
round XIV) or the minimal engineered aptamer (RNApt16). An
example of the information yielded by RNAup can be seen in
Figure S7. The regions of the target molecule with the highest
probability of interaction with the aptamers are listed in Table S2.
Among them, the binding site (a), placed at the poly(A) domain (Fig. 1),
shows by far the most stable interaction.

Design of the molecule RNApt16neg as a negative control. The
design of a negative control molecule to evaluate the activity of



SCIENTIFIC REPORTS | 4:6242 | DOI: 10.1038/srep06242







Figure 4 | In silico prediction of the MFE secondary structure of the
aptamers. (A) Group 2. (B) Group 1. (C) Rest of structures. (D) RNApt16
and RNApt16neg. _N: number of repetitions of a particular sequence. The
16 nt-long stem-loop motif is boxed. Probabilities for every nucleotide to
actually hold the structural position shown are represented by a colour
code from deep blue (lowest) to red (highest).

RNApt16 was computationally challenging. In order to do so, we
exhaustively folded all sequences 5'-CCCCNNNNNNNNGGGG-3'
using RNAfold and kept those which had as their MFE the stem-loop
structure 5'-((((........))))-3'. Out of 4^8 = 65,536 sequences of length
16 nt, 19,087 folded into the desired structure. For each of these
sequences, we compared the frame of 8 unpaired, variable nucleotides
in the trial molecules with a moving frame of 8 nucleotides along
the target RNA. We investigated the possible antiparallel interactions
(5'-3' for the target UTR308 and 3'-5' for the negative control molecule).
For each trial sequence, there are 302 possible interaction positions
with the target. For each position of the moving frame, we counted: i)
the total number of bases compatible for pairing; ii) the number of
adjacent bases compatible for pairing. For each of the 19,087
sequences, we kept the maximum value of these two magnitudes
over the 302 frame positions as the overall pairing number.
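
The moving-frame count described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the text does not define which pairs count as "compatible", so the sketch assumes Watson-Crick pairs plus, optionally, G-U wobble pairs (the `allow_wobble` flag is an assumption), and the target used here is a short hypothetical sequence rather than UTR308.

```python
from itertools import product

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # assumption: optionally allowed


def frame_scores(loop: str, target: str, allow_wobble: bool = True) -> tuple:
    """Slide a frame of len(loop) along `target` (read 5'->3') against the
    loop read 3'->5', and return the maxima over all frames of:
      (total compatible positions, longest run of adjacent compatible ones).
    """
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    loop_3to5 = loop[::-1]
    best_total = best_adjacent = 0
    for start in range(len(target) - len(loop) + 1):
        frame = target[start:start + len(loop)]
        total = run = longest = 0
        for t_base, l_base in zip(frame, loop_3to5):
            if (t_base, l_base) in allowed:
                total += 1
                run += 1
                longest = max(longest, run)
            else:
                run = 0
        best_total = max(best_total, total)
        best_adjacent = max(best_adjacent, longest)
    return best_total, best_adjacent


# Number of candidate loops in 5'-CCCCNNNNNNNNGGGG-3'.
n_loops = sum(1 for _ in product("ACGU", repeat=8))
print(n_loops)  # 65536

# Hypothetical toy target containing the poly(A) loop hexamer CUUGCC.
toy_target = "AAAACUUGCCAAAA"
print(frame_scores("GGCAAGGA", toy_target))  # (6, 6)
```

In a screen for a negative control, each candidate loop would be scored against the full target in this way, and candidates with the smallest maxima would be kept for the energy-based evaluation described next.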

Since we were interested in sequences with the structure of
RNApt16, but binding the target with the lowest possible probability,
we first selected those sequences with minimum overall pairing
numbers. As a result, we obtained 112 sequences which had: i) a
maximum of 5 possible pairing events in an 8 nt frame along the target
RNA; ii) a maximum of 4 possible adjacent pairs within the frame.
Using RNAup, we obtained for all these sequences their total
interaction energies, taking into account the two individual structures. The
result showed interactions between 2 and 11 pairs of nucleotides and
total free interaction energies between -0.40 and -7.20 kcal/mol.

After analyzing the 112 sequences we chose as the negative control,
termed RNApt16neg, the molecule 5'-CCCCGAAAACAAGGGG-3'
(Fig. 4D), which has the following properties: i) it is the
sequence with the weakest total interaction energy with the target
(-0.40 kcal/mol, the least negative value); ii) its interaction with
the target involves only 2 nt (positions 5-6 of RNApt16neg and
positions 22-23 of the target); iii) these interaction positions are
in the loop of molecule RNApt16neg.

In vitro testing of the aptamer binding to the HIV-1 UTR308
molecule. Binding to the HIV-1 target UTR308 was analyzed for
the two most abundant aptamers in populations XI and XIV
(represented by sequences XIV22 and XIV26), and for RNApt16. Since
the subsequent inhibition assays were performed using fused aptamer-U6
snRNA cassette molecules, we carried out binding assays with these
chimeric molecules, termed LXIV22 and LXIV26. RNApt16 was
assayed without any modification in both binding and inhibition
assays, as we were interested in studying the functional properties of
the minimal aptamer. The interaction of the RNApt16neg molecule
with the target was also assayed as a negative control. The binding
efficiency was analyzed by gel electrophoresis mobility shift assays
(Fig. 5). Multiple complex conformers were observed for the LXIV22
aptamer RNA, whereas a major complex conformer was obtained for
the LXIV26 (Fig. 5A) and RNApt16 (Fig. 5B) aptamers. No interaction
was detected for the control molecule RNApt16neg, thus
demonstrating the usefulness of our in silico approach for designing such
a negative control. Results obtained from three independent
experiments were quantified and fitted to a hyperbolic one-site
binding curve with an R² coefficient higher than 0.99 for aptamers
LXIV26 and RNApt16, yielding Kd values of 82 ± 13 nM and 280 ±
60 nM, respectively. Binding of the aptamer LXIV22 followed
a one-site specific binding curve with Hill slope, also with an R²
coefficient higher than 0.99 and with a Kd of 154 ± 5 nM.

Efficient inhibition of viral particle production by the selected
aptamers. The effect of the two most represented in vitro selected
aptamers, XIV22 and XIV26, on HIV-1 production was assayed.
Unmodified, pre-synthesized aptamers gave no satisfactory results
(data not shown), probably due to their limited half-life or incorrect
cellular location. Thus, we decided to flank them with the 5' and 3'
stable hairpin-loop domains of the human U6 snRNA (Fig. 6A),
giving rise to the modified aptamers termed LXIV22 and
LXIV26. Aptamer sequences were cloned into vector pU6, and
templates for in vitro transcription of aptamers LXIV22 or LXIV26
were prepared from the resulting plasmids. HEK 293T cells were
co-transfected with 500 ng of in vitro transcribed LXIV22 or LXIV26
RNA molecules and 100 ng of plasmid pNL4.3 (containing the
HIV-1 NL4.3 proviral DNA). Viral particle production was measured as
p24 antigen abundance in the cellular supernatant, yielding
inhibitions of HIV-1 viral particle production of 77 ± 7% and
80 ± 3%, respectively, with respect to the control RNA termed
'L-empty', which was transcribed from a pU6 plasmid without any
cloned aptamer sequence and thus consisted of the two flanking U6
terminal 5'- and 3'-hairpins (Fig. 6B).

To validate the in silico shortening of the in vitro selected aptamers,
inhibition of HIV-1 production by the designed RNApt16 was
also tested. HEK 293T cells were co-transfected with 500 ng of
chemically synthesized RNApt16 and 100 ng of pNL4.3. An inhibition of
viral particle production of 85 ± 5% was observed with respect to the
synthetic RNApt16neg control molecule, which did not show any
inhibitory effect (Fig. 6C). This result clearly points to the
usefulness of the combined in vitro - in silico approach to design and
optimize novel anti-HIV-1 agents.

Discussion 

Anti-HIV-1 RNA aptamers have been described to inhibit viral
replication by interacting with different viral or cellular targets. In
the present work, 64 nt-long RNA aptamers directed against the first
308 nt of the 5'-UTR of the HIV-1 genomic RNA have been selected
in vitro. Based on our experimental results and on previous reports
on the computational study of the SELEX outcome, we have
analysed in silico the RNA sequences and structures obtained along
the process. Interestingly, a conserved, 16 nt-long sequence-structure
motif present in most of the aptamers that target the viral RNA has
been identified. The in silico designed RNApt16 aptamer showed
specific binding to the UTR308 RNA in vitro, and it efficiently
inhibited the production of HIV-1 viral particles in transiently
transfected HEK 293T cell cultures. To our knowledge, RNApt16
is the shortest efficient anti-HIV-1 aptamer described to date.
Conventional SELEX procedures impose a minimal length
on the selected molecules and on the minimal aptamers obtained by
sequence trimming, thus compromising the exploitation of
aptamers as therapeutic agents. This limitation is overcome here
by means of a combined in vitro - in silico approach.

The 16 nt-long, highly conserved stem-loop motif shared by most
of the selected aptamers (in particular, the abundant sequences
represented by XIV22 and XIV26) served as a guide for designing
the RNApt16 molecule, together with RNApt16neg as a negative
control. Binding of the RNApt16 molecule to the HIV-1 5'-UTR
resembled the XIV26 binding behaviour, with a single aptamer:target
complex and fitting to a hyperbolic binding curve, while XIV22
showed different aptamer:target complexes and a sigmoidal binding
curve. However, RNApt16 bound less efficiently than
aptamers XIV22 and XIV26, which indicates that the rest of the
complete aptamer molecule, apart from the 16 nt-long motif,
contributes to the overall binding.



There are several quantitative properties that characterize the
higher complexity of the selected, full-length aptamers in comparison
to RNApt16. The analysis of the ensemble diversity (ED) and the
frequency of the minimum free energy structure (FME) showed that
two groups of aptamers were segregated from round IX on, with each
of the two most represented molecules in round XIV belonging to
one of these subpopulations (Figs. S1 and S5). The subpopulation
represented by aptamer XIV26 (group 1) had low ED (3.12) and high
FME (33.22%), while that represented by XIV22 (group 2) showed
the opposite behaviour (ED = 11.65; FME = 6.07%). This observation
ultimately stems from the difference in stability of the two
corresponding folded configurations (Table S1, Figs. 4A, 4B and S8).
It is plausible that the structural plasticity of aptamer XIV22, with a
fold of low stability outside the conserved, relevant interaction site,
may promote multiple binding complexes, whereas aptamer XIV26
exhibits a robust secondary structure that likely entails a predominant
binding complex. This possibility is in agreement with the in
vitro results of the aptamer binding to the HIV-1 UTR308 molecule,
where the participation of extra regions present in aptamer XIV22
might be responsible for the formation of multiple aptamer:target
complexes and a binding curve with a different topology with respect
to that of XIV26 and RNApt16 (Fig. 5). Nonetheless, the dominant
aptamer-target interaction is always that occurring at the consensus
octamer 5'-GGCAAGGA-3', located at the apical loop of a structural
motif 5'-((((........))))-3' within the aptamer. It is interesting that the
observation of other potentially binding regions is conditional on the
folded structure of the aptamer. Such is the case of region (a') in Fig. 1
(see also Table S2), which interacts with a hexamer found in all
assayed aptamers, since it belongs to the 3' common flanking region
(nucleotides 55-60): while in XIV22 and XIV1 that interaction is
observed, and it occurs with a relatively high binding energy, we have
not found it in XIV26. This observation is compatible with a dynamic
(low folding energy) configuration in nucleotides 55-60 of the former
two aptamers (Fig. 4A), while in the latter the breakage of the
highly stable stem (Fig. 4B) followed by the binding to region (a') is
not favoured energetically and does not take place. In agreement with
this result, we have checked that aptamer XIV5, characterized by a
significantly lower folding energy in that domain, may indeed bind to
(a') (data not shown).

The HIV-1 inhibition efficiency of the aptamers seems to decrease
when different competing interaction sites are present in the molecule,
which constitutes an additional advantage of RNApt16. As a
matter of fact, while RNApt16 alone inhibits the production of viral
particles by up to 85%, inhibition by the in vitro selected 64 nt-long
aptamers (up to 75%) was only achieved when they were fused to the U6
snRNA flanking hairpins (Fig. 6 and data not shown), which increase
their molecular length by more than 60 nt. Indeed, we cannot rule out
that the addition of the flanking hairpins had a direct effect on the
aptamer-target interaction.

The preferred target site of the selected aptamers and the RNApt16
molecule was the 5'-CUUGCC-3' sequence (nts 81-86 of the HIV-1
genome) exposed in the apical loop of the essential poly(A) domain
(Fig. 1). This hexamer is highly conserved among all the HIV-1
strains, subtypes, circulating intersubtype recombinant forms and
groups, and even among the closely related simian immunodeficiency
virus (SIV) from chimpanzee (Fig. S9 and data not shown).
This sequence conservation makes it interesting to analyze the
inhibitory potential of the selected aptamers and RNApt16 against
different clinical HIV-1 isolates belonging to distinct viral subtypes and
groups. Although an identical poly(A) domain is present at both ends
of all intracellular genomic and subgenomic HIV-1 RNAs, only the
poly(A) at the 3' end leads to polyadenylation of the HIV RNAs.
Therefore, we hypothesize that the achieved inhibitory effect of the in
vitro selected aptamers and the in silico derived one (RNApt16)
might be explained as a result of the interference with the proper
3' end RNA polyadenylation (Fig. 7).

However, we cannot rule out other inhibitory effects derived from
aptamer binding to the 5' poly(A) domain. The binding of the aptamers
to the 5' end poly(A) domain might additionally interfere with
translation, reverse transcription, RNA dimerisation and encapsidation,
by affecting the folding of neighbouring domains within the
5'-UTR. The interference with the latter two processes may result
in the promotion of aberrant genomic-subgenomic RNA heterodimers
through 5'-UTR reshaping, yielding viral particles that are
incompetent for infection (Fig. 7). Additionally, aptamer binding
might interfere with the previously described pseudoknot involving
the 5' poly(A)-targeted nucleotides and a downstream gag ORF
region, whose function remains unknown. In any case, we provide
evidence that interfering with the poly(A) domain may effectively
challenge the successful completion of the HIV-1 cycle.

Our results underline the usefulness of the engineered RNApt16
aptamer as an active inhibitor of HIV-1. RNApt16 includes a 4 bp-long
stem entirely composed of G-C pairs that could protect the
molecule against 5'-end degradation, as U6 snRNA hairpins may
do in the LXIV22 and LXIV26 aptamers. Indeed, uncapped and
non-polyadenylated natural RNAs are protected from exonuclease
degradation by trapping their ends into G-C rich hairpins (e.g. naked RNA
viruses, tRNAs and 5S rRNA), and such protection is usually
essential to increase the half-life and, thus, the efficiency of inhibitory
RNAs for therapeutic applications. The remarkably small size of
RNApt16 and its clear inhibitory effect without the need for further
modifications make this poly(A) domain-binding molecule a very
promising candidate for the development of anti-HIV strategies. A
putative increase in the stability of either the in vitro selected aptamers
or the in silico designed one will be explored by chemical
modifications. Finally, the eventual selection of resistance mutations to
the reported aptamers will be the subject of further investigation. In any
case, the combination of RNApt16 with other aptamers or with
distinct inhibitory RNA molecules (e.g. siRNAs) of different
specificity might help delay the generation of escape HIV-1 mutant
variants, as has already been shown in mouse models.
Altogether, our results exemplify the applicability of a combined
in vitro - in silico approach for the design and optimization of efficient
anti-HIV-1 aptamers, useful as either drug candidates or diagnostic
tools.

Methods 

DNA templates and RNA synthesis. The HIV-1 5'-UTR RNA target molecule
(termed UTR308 since it spans the first 308 nts of the genomic sequence) was
synthesized by in vitro transcription of a PCR-amplified DNA template obtained
from the pNL4.3 plasmid as previously described. In turn, the pU6-based eukaryotic
RNA expression vectors were obtained by cloning the aptamer coding sequences
XIV22 and XIV26 within the KpnI and ApaI restriction sites of vector pU6. The
aptamer-coding fragments were obtained by PCR amplification of the corresponding
pGEM-T® Easy (Promega, Madison, US-WI) aptamer-coding plasmid, using primers
5'KpnIC3 (5'-CGACTCGGTACCGGGAATTCAA-3') and 3'ApaIC3
(5'-TCTGGGCCCGTCGAGCTCGTAGTATC-3'). The resulting plasmids were called
pU6-LXIV22 and pU6-LXIV26, and led to the synthesis of the 128 nt-long LXIV22
and LXIV26 RNA molecules, composed of the corresponding RNA aptamer flanked
by the 5' and 3' hairpins of the human U6 snRNA. RNA molecules LXIV22 and
LXIV26 were obtained by in vitro transcription of the corresponding PCR templates
as previously described.

In vitro selection of aptamers. RNA aptamers were generated using a SELEX
procedure against the HIV-1 molecule UTR308. The starting population of 84 nt-long
DNA molecules consisted of 25 randomized nts flanked by constant sequences, and it
was obtained by annealing and extension of 7.5 nmol of each of the oligonucleotides
5'EcoRIK (5'-GGATAATACGACTCACTATAGGGAATTCAA-3') and
3'RANDOMK (5'-GTCGAGCTCGTAGTATCAGATCACTCCATN25TTGAATTCCCTATAGTG-3'),
where the underlined sequence corresponds to the T7
RNA polymerase promoter and 'N' stands for a randomized position. The amount of
the oligonucleotide including the 25 nt-long random region was 4-fold higher than
the theoretical number of different molecules (1.1 × 10^15). The population of
64 nt-long RNA aptamers was in vitro transcribed as previously described.
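
The figures quoted for the starting library can be checked with a short calculation. The following is a minimal sketch, not part of the original protocol: it assumes a fully randomized 25 nt region over four nucleotides and uses the 7.5 nmol of oligonucleotide stated above; the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def sequence_space(random_length: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences for a fully randomized region."""
    return alphabet_size ** random_length


def fold_coverage(amount_nmol: float, random_length: int) -> float:
    """Average number of copies of each possible variant in the given amount."""
    molecules = amount_nmol * 1e-9 * AVOGADRO
    return molecules / sequence_space(random_length)


variants = sequence_space(25)       # 4**25 possible 25 nt sequences
coverage = fold_coverage(7.5, 25)   # 7.5 nmol of the randomized oligonucleotide
print(f"{variants:.2e} variants, {coverage:.1f}-fold coverage")
```

Under these assumptions the sequence space is about 1.1 × 10^15 variants and 7.5 nmol corresponds to roughly a 4-fold excess, in agreement with the values given in the text.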

The target RNA molecule UTR308 was internally biotinylated during transcription
by adding biotinylated UTP (Roche, Indianapolis, US-IN) to the transcription mix at
0.106 mM final concentration (in the presence of 1 mM non-biotinylated UTP). This
resulted in the incorporation of, on average, one biotinylated UTP residue per
molecule. Two transcription reactions were performed in parallel at 37 °C for 2 hours, and
DNA templates were removed using RQ1 DNase (Promega, Madison, US-WI) at
1 U/µg DNA concentration, at 37 °C for 30 min. All transcription products were
gel-purified and ethanol-precipitated.

The target RNA was resuspended in 1 ml of binding buffer (150 mM sodium
chloride, 20 mM sodium phosphate, pH 7.5) and renatured at 65 °C for 10 min
followed by an additional incubation at 37 °C for 10 min. Then, biotinylated UTR308
was bound to a HiTrap™ Streptavidin HP column (GE Healthcare, Chalfont St.
Giles, UK) following the manufacturer's instructions. The column was washed with
10 ml of binding buffer and equilibrated with 10 ml of TMN 1X buffer (20 mM
TRIS-acetate; 10 mM magnesium acetate; 100 mM sodium chloride). The starting
RNA aptamer population exceeded 40 µg, theoretically ensuring the presence
of at least one copy of each of the possible sequence variants. Prior to the first
selection round, a negative selection step was carried out to prevent selection of
sepharose and/or streptavidin binders: for this purpose the initial RNA
population was passed through an unloaded sepharose column at 25 °C and the
unbound molecules were recovered. This RNA population was loaded into a
target-containing column in 1 ml of TMN 1X, and incubated for binding at 25 °C for
30 min. Unbound molecules were discarded by washing with 10 ml of TMN 1X at
the binding temperature, and bound molecules were then recovered by elution with
10 ml of TMN 1X at 95 °C. The first four 1 ml fractions recovered were concentrated
using Centricon Ultracel YM-3 (Merck Millipore, Billerica, US-MA), and
ethanol-precipitated.

One half of the recovered RNA pool was reverse transcribed and amplified by Tth
DNA polymerase (Promega, Madison, US-WI) using the primers 3'XhoIK
(5'-GTCGAGCTCGTAGTATCAGATCACTCCAT-3') and 5'EcoRIK. The cDNA
population was phenol-extracted, ethanol-precipitated and used as template for a new
round of selection. Fourteen rounds of amplification-selection were performed, and
the selection pressure was increased along the procedure as follows: i) binding
temperature was 25 °C for rounds I to III, and 37 °C for rounds IV to XIV; ii) the
aptamer:target ratio was 1:1 for rounds I to X, and 1000:1 for rounds XI to
XIV. The second half of the cDNA population of each round was cloned in E. coli
using the pGEM-T® Easy vector, and a minimum of 24 molecular clones were sequenced.

In silico methods for sequence analysis. For sequence comparison, Hamming
distances between each pair of clonal sequences were calculated as the number of
positions at which the corresponding nucleotides are different. Clustering of the
non-repeated, 64 nt-long sequences was performed based on their mutual Hamming
distances. An outgroup sequence was designed that contained the 10 nt- and
29 nt-long constant sequences present at the 5' and 3' ends of all molecules, flanking an
artificial 25 nt-long sequence 5'-(ACGT)6A-3'. Thus, the complete sequence of the
outgroup was:
5'-GGGAATTCAAACGTACGTACGTACGTACGTACGTAATGGAGTGATCTGATACTACGAGCTCGAC-3'.
The topology of the clustering
was inferred by means of the neighbour-joining (NJ) method using the program
NEIGHBOR from the PHYLIP v3.6 package.

In silico methods for analyzing RNA structure. We used the Vienna RNA package,
version 1.5, to fold RNA sequences into their minimum free energy (MFE) secondary
structure, as well as to compare the obtained structures (see below). The previously
described standard parameter set was used, allowing for A-U, G-C, and wobble G-U
base pairs, and disallowing the formation of isolated base pairs. Three relevant 
quantities were computed for every folded structure, in addition to its folding energy: 
i) its ensemble diversity (ED); ii) the frequency of the MFE structure (FME) in the 
thermodynamic ensemble; iii) the ensemble centroid structure (CE). The ED is the 
average distance between all the possible secondary structures present in the 
thermodynamic ensemble, which is in turn defined as the set of all different structures 
(each characterized by its corresponding folding energy) compatible with a given 
sequence. Structures with similar energy in the ensemble are found with comparable 
frequency. If the energy of the MFE structure is significantly lower than that of the
rest of the structures in the ensemble, then the corresponding sequence will fold most
of the time into the MFE structure, which will thus be much more frequent. The centroid of a set of
structures is the structure that has the minimum total base-pair distance to the
structures in the set, thus being the single structure that best represents the set as a
whole. For representation of RNA secondary structures the dot-bracket notation
was used, where unpaired nucleotides are denoted by dots, and paired nucleotides by 
parentheses: '(' indicates that the partner is downstream, and ')' that the partner is 
upstream. 
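
Two of the ideas in this paragraph lend themselves to a short illustration: reading a dot-bracket string, and the relation between folding energy and frequency in the ensemble. The sketch below is an assumption-laden example, not the Vienna RNA implementation: it uses the standard Boltzmann weighting exp(-E/RT), hypothetical function names, and a 16 nt hairpin with a 4 bp stem and an 8 nt loop, which matches the description of RNApt16 but was not produced by a folding program here.

```python
import math

GAS_CONSTANT = 1.98717e-3  # kcal/(mol*K)


def parse_dot_bracket(structure: str) -> dict[int, int]:
    """Map each paired position (0-based) to the position of its partner."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)              # partner is downstream
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' in structure")
            j = stack.pop()              # partner is upstream
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def boltzmann_frequencies(energies_kcal: list[float],
                          temperature_c: float = 37.0) -> list[float]:
    """Relative frequency of each structure given its folding energy."""
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    weights = [math.exp(-e / rt) for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


print(parse_dot_bracket("((((........))))"))
# Hypothetical energies: one structure far more stable than the other.
print(boltzmann_frequencies([-8.0, -2.0]))
```

With equal energies the two frequencies are equal; with a large energy gap the most stable structure dominates the ensemble, as stated in the text.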

Structure comparison was performed by means of base-pair distances, which 
measure the number of base pairs that must be opened and closed in order to convert 
one structure into the other. We checked that structure comparison through 
Hamming or tree-edit distances yielded qualitatively equivalent results. The 
Hamming distance between two folded sequences (that must be of equal length) is 
defined as the number of positions at which the structural states ['.', '(' or ')'] of the 
corresponding nucleotides differ. In turn, the tree-edit distance compares structural 
elements (e.g., hairpins, bulges or stems) and allows for a variable number of 
nucleotides involved in those motifs or separating them. It is slightly more
computationally complex since it was devised to compare sequences of different length.
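
The two simpler structure distances defined above can be written compactly. The following is a minimal sketch with hypothetical function names; it operates on dot-bracket strings of equal length and does not reproduce the tree-edit distance, which requires a structural alignment.

```python
def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Set of (i, j) base pairs, 0-based with i < j, in a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def base_pair_distance(s1: str, s2: str) -> int:
    """Base pairs that must be opened or closed to convert s1 into s2."""
    return len(base_pairs(s1) ^ base_pairs(s2))


def structure_hamming(s1: str, s2: str) -> int:
    """Positions at which the structural state ('.', '(' or ')') differs."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return sum(a != b for a, b in zip(s1, s2))


a = "((((........))))"   # 4 bp stem, 8 nt loop
b = "(((..........)))"   # same hairpin with the innermost pair opened
print(base_pair_distance(a, b), structure_hamming(a, b))
```

Opening a single base pair changes the base-pair distance by one but the Hamming distance by two, since both partners change state.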

A cluster analysis of the structures was performed, based on their base-pair
distances. An artificial outgroup structure was designed, showing the maximum number
of paired bases in its 64 nt-long sequence:
5'-((((((((((((((((((((((((((((((....))))))))))))))))))))))))))))))-3'.
Clustering was performed as described above.

To predict those sites where interaction between aptamer and ligand is favoured,
we used the RNAup algorithm from the Vienna RNA package, version 1.8.4.
RNA-RNA binding is decomposed in two steps: first, the probability that a sequence
interval (e.g. a binding site) remains unpaired is computed; in a second step, the
binding energy provided that the binding site is unpaired is calculated as the optimum
over all possible types of bindings. Parameters are chosen as for RNAfold. In
particular, isolated base pairs are not allowed and the length of the unstructured regions
is 4 nts.
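
The two-step decomposition can be illustrated numerically. This sketch does not call the Vienna RNA package; it assumes the usual thermodynamic relation in which the energy needed to open a site is -RT ln(P), where P is the probability that the site is unpaired, and adds it to the energy of the intermolecular duplex. All function names and numerical values are hypothetical.

```python
import math

GAS_CONSTANT = 1.98717e-3  # kcal/(mol*K)


def opening_energy(p_unpaired: float, temperature_c: float = 37.0) -> float:
    """Free energy (kcal/mol) required to keep a site unpaired: -RT ln(P)."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    return -rt * math.log(p_unpaired)


def total_binding_energy(p_unpaired_target: float,
                         p_unpaired_aptamer: float,
                         duplex_energy: float,
                         temperature_c: float = 37.0) -> float:
    """Step 1 (opening both sites) plus step 2 (energy of the duplex)."""
    return (opening_energy(p_unpaired_target, temperature_c)
            + opening_energy(p_unpaired_aptamer, temperature_c)
            + duplex_energy)


# Hypothetical values: an accessible target site and a mostly open aptamer loop.
print(round(total_binding_energy(0.5, 0.8, -9.0), 2))
```

A site that is already unpaired most of the time costs little to open, so the same duplex energy yields a more favourable total than for a site buried in stable structure.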

In vitro aptamer binding assays. The evaluation of the binding efficiency of the
selected aptamers was essentially performed as previously described, using trace
amounts of renatured, 5' end 32P-labelled aptamers and 0, 2, 20, 200 or 400 nM of
non-labelled UTR308 target RNA.

Results obtained from three independent experiments were quantified and fitted to 
either a hyperbolic one-site binding curve (Equation 1) or a one-site specific binding 
curve with Hill slope (Equation 2): 



$$Y = \frac{B_{max} \cdot X}{K_d + X} \tag{1}$$

$$Y = \frac{B_{max} \cdot X^h}{K_d^h + X^h} \tag{2}$$

where $B_{max}$ is the maximum value of $Y$ (when $X \to \infty$); $K_d$ is the value of $X$ when
$Y = B_{max}/2$; and $h$ is the Hill slope and indicates the degree of cooperativity.
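
Equations 1 and 2 can be evaluated directly at the target concentrations used in the binding assays. The sketch below only implements the two model functions; the parameter values are hypothetical and no curve fitting is performed.

```python
def one_site(x: float, bmax: float, kd: float) -> float:
    """Equation 1: hyperbolic one-site binding."""
    return bmax * x / (kd + x)


def one_site_hill(x: float, bmax: float, kd: float, h: float) -> float:
    """Equation 2: one-site specific binding with Hill slope h."""
    return bmax * x ** h / (kd ** h + x ** h)


concentrations_nm = [0, 2, 20, 200, 400]  # target RNA concentrations from the assay
bmax, kd, h = 100.0, 20.0, 2.0            # hypothetical parameters, for illustration only

for x in concentrations_nm:
    print(x, round(one_site(x, bmax, kd), 1), round(one_site_hill(x, bmax, kd, h), 1))
```

Both models give half of the maximum when X equals Kd, and Equation 2 reduces to Equation 1 when h = 1.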

HIV-1 inhibition assays. The viral production inhibition assays were performed by
transient transfection of human embryonic kidney (HEK) cells as previously
described, using 100 ng of plasmid pNL4.3 plus 500 ng of RNA aptamer in
co-transfection experiments.